RapidMiner Studio documentation, version 10.0
Split Document into Collection (Operator Toolbox)
Synopsis
This operator splits a document (for example from Read Document) into a collection of documents, according to the split string parameter.
Description
This operator receives a document at its input port and splits it into a collection of documents, according to the split string parameter. The input document can originate, for example, from a Read Document operator, which reads a complete text file and provides its content as one document. You can use the Split Document into Collection operator to split this document into a collection and process the resulting documents one by one. For example, to process a file line by line, use the end-of-line character (\n) as the split string.
The split documents are also converted into an ExampleSet with one attribute, containing the documents.
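The splitting behavior described above can be sketched outside of RapidMiner in a few lines of Python. This is only an illustration of the semantics, not the operator's implementation: the function name split_document and the column name "text" are hypothetical, and the sketch assumes the split string is matched literally and that empty pieces are kept, which may differ from what the operator does.

```python
def split_document(text, split_string):
    """Split text on a literal split string; the separator itself is dropped."""
    if not split_string:
        raise ValueError("split_string must not be empty")
    # The "collection": one string per resulting document.
    collection = text.split(split_string)
    # The "example set": one row per document, with a single column.
    # The column name "text" is an assumption made for this sketch.
    example_set = [{"text": doc} for doc in collection]
    return collection, example_set


collection, example_set = split_document("first line\nsecond line\nthird line", "\n")
for doc in collection:
    print(repr(doc))
```

Both results hold the same three documents, and none of them contains the newline that was used as the split string.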
Input
- document
The input document.
Output
- collection (Collection)
The resulting collection of documents.
- example set (Data Table)
An ExampleSet containing the split documents as an attribute. Each document is one example.
Parameters
- split_string
String on which the input document is split. The split string is not included in the resulting documents. Range:
Tutorial Processes
Use Split Document into Collection to process a document line by line
This tutorial process illustrates how to use the Split Document into Collection operator to process a larger document line by line. A Create Document operator is used to create an example document containing multiple lines of data. The Split Document into Collection operator then splits the input document into a collection with one document per line. Finally, the JSON to Data operator converts this collection into an ExampleSet.
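The same line-by-line pattern can be approximated in plain Python, which may help when checking what the process should produce. The sketch assumes that each line of the example document is a self-contained JSON object; the sample data and field names below are invented for illustration and are not taken from the tutorial process.

```python
import json

# Hypothetical sample document: one JSON object per line.
document = (
    '{"id": 1, "name": "a"}\n'
    '{"id": 2, "name": "b"}\n'
    '{"id": 3, "name": "c"}'
)

# Split on the end-of-line character and skip blank lines.
lines = [line for line in document.split("\n") if line.strip()]

# Parse each line separately, yielding one row per document.
rows = [json.loads(line) for line in lines]

for row in rows:
    print(row)
```

Each parsed row corresponds to one example in the resulting ExampleSet, with the JSON keys playing the role of attributes.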